Hamiltonian Monte Carlo

The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for which direct sampling is difficult. This sequence can be used to estimate integrals with respect to the target distribution (expected values).

Hamiltonian Monte Carlo corresponds to an instance of the Metropolis–Hastings algorithm, with a Hamiltonian dynamics evolution simulated using a time-reversible and volume-preserving numerical integrator (typically the leapfrog integrator) to propose a move to a new point in the state space. Compared to using a Gaussian random walk proposal distribution in the Metropolis–Hastings algorithm, Hamiltonian Monte Carlo reduces the correlation between successive sampled states by proposing moves to distant states which maintain a high probability of acceptance due to the approximate energy-conserving properties of the simulated Hamiltonian dynamics when using a symplectic integrator. The reduced correlation means fewer Markov chain samples are needed to approximate integrals with respect to the target probability distribution for a given Monte Carlo error.

The algorithm was originally proposed by Simon Duane, Anthony Kennedy, Brian Pendleton and Duncan Roweth in 1987 for calculations in lattice quantum chromodynamics. In 1996, Radford M. Neal brought attention to the usefulness of the method for a broader class of statistical problems, in particular artificial neural networks. The burden of having to supply gradients of the respective densities delayed the wider adoption of the method in statistics and other quantitative disciplines, until in the mid-2010s the developers of Stan implemented HMC in combination with automatic differentiation.


Algorithm

Suppose the target distribution to sample is f(\mathbf{x}) and a chain of samples \mathbf{X}_0, \mathbf{X}_1, \mathbf{X}_2, \ldots is required.

Hamilton's equations are

:\dfrac{\mathrm{d}x_i}{\mathrm{d}t} = \dfrac{\partial H}{\partial p_i}

and

:\dfrac{\mathrm{d}p_i}{\mathrm{d}t} = -\dfrac{\partial H}{\partial x_i}

where x_i and p_i are the ith components of the position and momentum vectors respectively and H is the Hamiltonian. Let M be a mass matrix which is symmetric and positive definite; then the Hamiltonian is

:H(\mathbf{x},\mathbf{p}) = U(\mathbf{x}) + \dfrac{1}{2}\mathbf{p}^{\mathsf{T}} M^{-1} \mathbf{p}

where U(\mathbf{x}) is the potential energy. The potential energy for a target density is

:U(\mathbf{x}) = -\ln f(\mathbf{x}),

which comes from the Boltzmann factor.

The algorithm requires a positive integer L for the number of leapfrog steps and a positive number \Delta t for the step size. Suppose the chain is at \mathbf{X}_n = \mathbf{x}_n. Let \mathbf{x}_n(0) = \mathbf{x}_n. First, a random Gaussian momentum \mathbf{p}_n(0) is drawn from \text{N}\left(\mathbf{0}, M\right). Next, the particle runs under Hamiltonian dynamics for time L\Delta t, which is done by solving Hamilton's equations numerically using the leapfrog algorithm. The position and momentum vectors after one step of size \Delta t are

:\mathbf{p}_n\left(t+\dfrac{\Delta t}{2}\right) = \mathbf{p}_n(t) - \dfrac{\Delta t}{2}\nabla U(\mathbf{x})\Big|_{\mathbf{x}=\mathbf{x}_n(t)}

:\mathbf{x}_n(t+\Delta t) = \mathbf{x}_n(t) + \Delta t\, M^{-1}\mathbf{p}_n\left(t+\dfrac{\Delta t}{2}\right)

:\mathbf{p}_n(t+\Delta t) = \mathbf{p}_n\left(t+\dfrac{\Delta t}{2}\right) - \dfrac{\Delta t}{2}\nabla U(\mathbf{x})\Big|_{\mathbf{x}=\mathbf{x}_n(t+\Delta t)}

These equations are applied to \mathbf{x}_n(0) and \mathbf{p}_n(0) L times to obtain \mathbf{x}_n(L\Delta t) and \mathbf{p}_n(L\Delta t).

The leapfrog algorithm is an approximate solution to the motion of non-interacting classical particles. If it were exact, the solution would never change the initial randomly generated energy distribution, as energy is conserved for each particle in the presence of a classical potential energy field. In order to reach a thermodynamic equilibrium distribution, particles must have some sort of interaction with, for example, a surrounding heat bath, so that the entire system can take on different energies with probabilities according to the Boltzmann distribution. One way to move the system towards a thermodynamic equilibrium distribution is to change the state of the particles using the Metropolis–Hastings algorithm: first a leapfrog trajectory is applied, then a Metropolis–Hastings accept/reject step. The transition from \mathbf{X}_n = \mathbf{x}_n to \mathbf{X}_{n+1} is

:\mathbf{X}_{n+1} \mid \mathbf{X}_n = \mathbf{x}_n = \begin{cases} \mathbf{x}_n(L\Delta t) & \text{with probability } \alpha\left(\mathbf{x}_n(0), \mathbf{x}_n(L\Delta t)\right) \\ \mathbf{x}_n(0) & \text{otherwise} \end{cases}

where

:\alpha\left(\mathbf{x}_n(0), \mathbf{x}_n(L\Delta t)\right) = \min\left(1, \dfrac{\exp\left[-H\left(\mathbf{x}_n(L\Delta t), \mathbf{p}_n(L\Delta t)\right)\right]}{\exp\left[-H\left(\mathbf{x}_n(0), \mathbf{p}_n(0)\right)\right]}\right)

This is repeated to obtain \mathbf{X}_{n+1}, \mathbf{X}_{n+2}, \mathbf{X}_{n+3}, \ldots
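The leapfrog-plus-Metropolis procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not any library's API: the function names, the diagonal mass matrix, and the standard-normal example target are choices made for this example.

```python
import numpy as np

def hmc_sample(U, grad_U, x0, n_samples, L=20, dt=0.1, rng=None):
    """Illustrative HMC sketch: U(x) = -log f(x) is the potential energy of
    the target density f, grad_U its gradient. Uses an identity mass matrix."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)          # momentum p ~ N(0, I)
        x_new, p_new = x.copy(), p.copy()
        # leapfrog: half momentum step, L position steps, half momentum step
        p_new -= 0.5 * dt * grad_U(x_new)
        for step in range(L):
            x_new += dt * p_new
            if step != L - 1:
                p_new -= dt * grad_U(x_new)
        p_new -= 0.5 * dt * grad_U(x_new)
        # Metropolis-Hastings accept/reject on the change in the Hamiltonian
        H_old = U(x) + 0.5 * np.sum(p**2)
        H_new = U(x_new) + 0.5 * np.sum(p_new**2)
        if rng.random() < np.exp(H_old - H_new):
            x = x_new                          # accept the proposed endpoint
        samples.append(x.copy())               # otherwise keep the old state
    return np.array(samples)

# Example target: 1-D standard normal, so U(x) = x^2 / 2 and grad U = x
draws = hmc_sample(U=lambda x: 0.5 * np.sum(x**2),
                   grad_U=lambda x: x,
                   x0=np.zeros(1), n_samples=2000,
                   rng=np.random.default_rng(0))
```

Because the leapfrog integrator conserves the Hamiltonian only approximately, the acceptance probability stays close to 1 even for long trajectories, which is what lets HMC propose distant, nearly uncorrelated states.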


No U-Turn Sampler

The No U-Turn Sampler (NUTS) is an extension of HMC that controls L automatically. Tuning L is critical: for example, in the one-dimensional \text{N}(0, 1/\sqrt{k}) case, the potential is U(x) = kx^2/2, which corresponds to the potential of a simple harmonic oscillator. If L is too large, the particle oscillates and wastes computational time; if L is too small, the particle behaves like a random walk.

Loosely, NUTS runs the Hamiltonian dynamics both forwards and backwards in time at random until a U-turn condition is satisfied. When that happens, a random point from the path is chosen for the MCMC sample and the process is repeated from that new point.

In detail, a binary tree is constructed to trace the path of the leapfrog steps. To produce an MCMC sample, an iterative procedure is conducted. A slice variable

:U_n \sim \text{Uniform}\left(0, \exp\left[-H\left(\mathbf{x}_n(0), \mathbf{p}_n(0)\right)\right]\right)

is sampled. Let \mathbf{x}_n^+ and \mathbf{p}_n^+ be the position and momentum of the forward particle respectively, and \mathbf{x}_n^- and \mathbf{p}_n^- those of the backward particle. In each iteration, the binary tree selects uniformly at random whether to move the forward particle forwards in time or the backward particle backwards in time; the number of leapfrog steps also doubles at each iteration. For example, in the first iteration the forward particle moves forwards in time using 1 leapfrog step; in the next iteration the backward particle moves backwards in time using 2 leapfrog steps.

The iterative procedure continues until the U-turn condition is met, that is,

:(\mathbf{x}_n^+ - \mathbf{x}_n^-) \cdot \mathbf{p}_n^- < 0 \quad \text{or} \quad (\mathbf{x}_n^+ - \mathbf{x}_n^-) \cdot \mathbf{p}_n^+ < 0

or when the Hamiltonian becomes inaccurate,

:\exp\left[-H\left(\mathbf{x}_n^+, \mathbf{p}_n^+\right) + \delta\right] < U_n

or

:\exp\left[-H\left(\mathbf{x}_n^-, \mathbf{p}_n^-\right) + \delta\right] < U_n

where, for example, \delta = 1000.

Once the U-turn condition is met, the next MCMC sample \mathbf{x}_{n+1} is obtained by sampling uniformly from those points on the leapfrog path traced out by the binary tree which satisfy

:U_n < \exp\left[-H\left(\mathbf{x}, \mathbf{p}\right)\right]

This is usually satisfied if the remaining HMC parameters are sensible.
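The U-turn termination criterion itself is simple to state in code. The helper below is an illustrative sketch (the function name and argument names are inventions for this example, not the API of any NUTS implementation): the trajectory has started to double back when the chord from the backward-most to the forward-most position shrinks along either endpoint's momentum.

```python
import numpy as np

def u_turn(x_plus, x_minus, p_plus, p_minus):
    """Return True when the NUTS U-turn condition holds:
    (x+ - x-) . p- < 0  or  (x+ - x-) . p+ < 0."""
    dx = x_plus - x_minus
    return np.dot(dx, p_minus) < 0 or np.dot(dx, p_plus) < 0

# A trajectory still moving away from its start has not U-turned:
# both endpoint momenta point along the chord between the endpoints.
u_turn(np.array([2.0, 0.0]), np.array([0.0, 0.0]),
       np.array([1.0, 0.0]), np.array([1.0, 0.0]))   # False

# If the forward momentum has reversed and points back toward the start,
# the condition fires and tree doubling stops.
u_turn(np.array([2.0, 0.0]), np.array([0.0, 0.0]),
       np.array([-1.0, 0.0]), np.array([1.0, 0.0]))  # True
```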


See also

* Dynamic Monte Carlo method * Software for Monte Carlo molecular modeling * Stan * Metropolis-adjusted Langevin algorithm


References




External links

* {{cite web |title=Markov chain Monte Carlo |work=Statistical Rethinking 2022 |first=Richard |last=McElreath |via=YouTube |url=https://www.youtube.com/watch?v=Qqz5AJjyugM&list=PLDcUM9US4XdMROZ57-OIRtIK0aOynbgZN&index=9}}
* Hamiltonian Monte Carlo from scratch
* Optimization and Monte Carlo Methods